ABSTRACT


The foundations for non-parametric probability estimation are presented
for random processes in discrete time. The estimators considered are the
empirical distribution function and the amplitude histogram. It is shown
that if the process is strictly stationary and satisfies a mixing
condition, the estimators are unbiased and consistent. Expressions for the
variance and covariance of these estimators are presented and the effect of
the correlations on these quantities is discussed. This effect is
demonstrated numerically by simulations of a Gaussian process. A theorem is
established which demonstrates the monotonic relation between the variance
and the correlations for Gaussian processes. This follows from a
corresponding property of the bivariate Gaussian distribution.




1. INTRODUCTION


A common problem in many areas of science is the treatment of correlated
data. Classical statistical methods which assume independence of
observations (and this is the great body of statistics) are generally not
applicable. This research addresses one aspect of this larger problem,
that of probability estimation with correlated data.

This  problem  arises  in  many  diverse  fields  where  models  based  on  time 
series or stochastic processes are used. In some of these areas, a
non-parametric estimate of the univariate probability is of intrinsic interest.
An  area  of  special  importance  is  that  where  the  model  is  a  stationary 
Gaussian  random  process.  This  importance  derives  from  the  parametric 
simplicity  of  Gaussian  processes  and  from  the  fact  that  many  techniques 
assume  or  require  Gaussianity  for  their  validity.  This  is  especially  true 
of  many  signal  processing  techniques.  For  example,  Gaussianity  is  usually 
assumed  in  the  smoothing,  prediction  and  signal  extraction  problems  and  is 
required  for  the  equivalence  of  maximum  likelihood  and  least  squares 
methods  of  estimation.  Given  the  current  popularity  of  maximum  entropy 
methods  (MEM)  of  spectral  estimation,  it  is  worth  pointing  out  that  such 
estimates  maximize  the  entropy  only  for  Gaussian  processes. 

Thus, there is ample motivation for the characterization of probability
estimation based on finite samples of random processes. Although
there has been exhaustive work in such estimation for independent variables
(see, e.g., Wegman (1972)), little has been done for random processes.
Patankar (1954) and Thrall (1965) seem to be the only authors to have
addressed this problem. The related problem of hypothesis testing for the
Gaussianity of a stochastic process has received more attention (Persson
[1974], Gasser [1975], Weiss [1978]). Most of this latter work has been
motivated by the particular statistical requirements involved in the
analysis of the electroencephalogram. (See also McEwen and Anderson [1975],
Saunders [1963], Elul [1969].) Since most of these authors use test
statistics constructed from univariate distribution estimates (i.e., the
Kolmogorov-Smirnov or Chi-squared tests), it is surprising that the
foundation for this distribution estimation has not been more thoroughly
explored.

In this work, we will investigate the estimation of the univariate
distribution function based on a finite sample of a stochastic process. We
will restrict our attention to strictly stationary random processes defined
on discrete time. The estimators that will be treated are the empirical
distribution function and the amplitude histogram. We will first establish
conditions for these estimators to be unbiased and consistent. The
variance and covariance of these estimators will be presented and it is
here that the presence of correlations manifests itself most strikingly.
The effect of correlations on these variances will be investigated in more
detail for the particular case of Gaussian processes. Simulations of a
Gaussian process will be performed to demonstrate numerically the effect of
correlations. These numerical results will then be generalized by a
theorem which shows that, for Gaussian processes, these variances are
monotonically related to the correlations.
2. SOME PROPERTIES OF THE ESTIMATORS


In  this  section,  we  will  develop  some  of  the  statistical  properties  of 
the  estimators  required  by  the  Kolmogorov-Smirnov  and  Chi-squared  tests. 
These  estimators  are,  respectively,  the  empirical  distribution  function  and 
the  so-called  amplitude  histogram  or  distribution.  These  are  closely 
related.  Parzen  (1962a)  has  established  these  properties  and  pursued  them 
into  the  realm  of  non-parametric  probability  estimation  for  the  case  of 
independent,  identically  distributed  variables.  Thrall  (1965)  has 
published  some  of  the  following  results  for  the  case  when  correlations  are 
present,  but  also  dealt  primarily  with  independent  observations. 

Suppose we have n successive observations of a stationary stochastic
process X(t) defined on the integers. We can assume, by stationarity, that
the observations are for times 1, 2, ..., n. We form the sample or empirical
distribution function, $F_n(a)$, as follows:

$$F_n(a) = \frac{1}{n} \sum_{i=1}^{n} I_a(X(i))$$

where $I_a(\cdot)$ is the indicator function of the set $(-\infty, a]$ defined by:

$$I_a(x) = \begin{cases} 1 & \text{if } x \le a \\ 0 & \text{if } x > a. \end{cases}$$

It is trivial to show that stationarity implies that

$$E[F_n(a)] = F(a)$$

where $F(\cdot)$ is the univariate distribution. It should be emphasized that
this unbiasedness holds even in the presence of correlations or
dependencies between variables.
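The definition of the empirical distribution function above translates directly into code. The following is a minimal sketch, assuming NumPy is available; the function name `empirical_cdf` and the sample values are illustrative and do not come from the original report.

```python
import numpy as np

def empirical_cdf(x, a):
    """F_n(a): the fraction of observations x[i] that satisfy x[i] <= a."""
    x = np.asarray(x, dtype=float)
    # (x <= a) is the indicator I_a applied to every observation
    return float(np.mean(x <= a))

# Illustrative data only: two of the five values are <= 0
sample = [0.3, -1.2, 0.8, 0.0, 2.1]
print(empirical_cdf(sample, 0.0))
```

Because the estimator is just an average of indicator values, its expectation is F(a) whatever the dependence between observations; only its variance is affected by correlation.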


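Next, we wish to evaluate variances. We have:

$$\mathrm{Var}[F_n(a)] = \mathrm{Var}\left[\frac{1}{n} \sum_{i=1}^{n} I_a(X(i))\right]
= \frac{1}{n^2} \left\{ \sum_{i=1}^{n} \mathrm{Var}[I_a(X(i))] + \sum_{i \ne j} \mathrm{Cov}[I_a(X(i)), I_a(X(j))] \right\}.$$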

But we also have:

$$\mathrm{Var}[I_a(X(i))] = F(a)[1 - F(a)].$$

Upon substitution, we get:

$$\mathrm{Var}[F_n(a)] = \frac{1}{n} F(a)[1 - F(a)] + \frac{1}{n^2} \sum_{i \ne j} \mathrm{Cov}[I_a(X(i)), I_a(X(j))].$$

Note that this expression depends on the covariance of the random variable
$I_a(X(i))$, which is a binary (i.e., 0 or 1) stochastic process. If the
variables X(i) and X(j) are independent, then the covariance vanishes and
the variance reduces to:

$$\mathrm{Var}[F_n(a)] = \frac{1}{n} F(a)[1 - F(a)].$$

This expression is easily understood because, in the case of independence,
$I_a(X(i))$ is a simple Bernoulli variable with parameters $p = F(a)$ and
$q = 1 - F(a)$. In the more general case, our variance is in the form of the
sum of two terms: the first is for independent variables and the second is
the correction for dependence. Because of stationarity, this can be
further simplified. Stationarity implies that for all integers h,

$$\mathrm{Cov}[I_a(X(t)), I_a(X(s))] = \mathrm{Cov}[I_a(X(t+h)), I_a(X(s+h))].$$


Thus, the double sum can be transformed and we obtain the same form for the
variance as obtained by Thrall (1965):

$$\mathrm{Var}[F_n(a)] = \frac{1}{n} F(a)[1 - F(a)] + \frac{2}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \mathrm{Cov}[I_a(X(0)), I_a(X(i))].$$
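The step from the double sum over i ≠ j to a single weighted sum over lags can be checked numerically. The sketch below assumes a hypothetical indicator autocovariance function `gamma` (a geometric decay chosen purely for illustration, not taken from the report) and compares the two forms of the variance.

```python
import numpy as np

def var_double_sum(F_a, gamma, n):
    """(1/n) F(1-F) + (1/n^2) * sum over all i != j of gamma(|i - j|)."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                total += gamma(abs(i - j))
    return F_a * (1.0 - F_a) / n + total / n**2

def var_single_sum(F_a, gamma, n):
    """(1/n) F(1-F) + (2/n) * sum_{h=1}^{n-1} (1 - h/n) gamma(h)."""
    h = np.arange(1, n)
    g = np.array([gamma(int(k)) for k in h])
    return F_a * (1.0 - F_a) / n + 2.0 / n * np.sum((1.0 - h / n) * g)

# Hypothetical autocovariance of the indicator process (illustration only)
gamma = lambda h: 0.1 * 0.5 ** h

print(var_double_sum(0.5, gamma, 50), var_single_sum(0.5, gamma, 50))
```

The two values agree up to floating-point rounding because each lag h occurs in exactly 2(n - h) of the ordered pairs (i, j).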

Parzen (1962b) gives the following theorem which states conditions for
$F_n(a)$ to be a consistent estimator of F(a).

Theorem 1 (Parzen): $\mathrm{Var}[F_n(a)]$ converges to zero if and only if
$\mathrm{Cov}[F_n(a), I_a(X(n))]$ converges to zero.

This theorem supplies the conditions necessary for a stochastic
process to be ergodic for the distribution function, i.e., for the
time-averaged sample distribution function (the empirical distribution
function) to be equal to the ensemble distribution function. This
condition is similar to an $\alpha$-mixing condition (see, e.g., Billingsley
[1979]), which means heuristically that variables that are temporally far
apart are effectively independent.

Since we are interested in generalizing the estimation framework
developed for independent variables, the notion of $\alpha$-mixing is especially
useful as a generalization of the notion of independence. We next
establish the consistency property in terms of this mixing condition.

Corollary 1.1: If the stationary stochastic process X(t) satisfies an
$\alpha$-mixing condition, then $\mathrm{Var}[F_n(a)]$ converges to zero.
Proof: By definition of $\alpha$-mixing, there exists a sequence
$\alpha_j$ such that $0 \le \alpha_j \le 1$ for all j, $\alpha_j$ converges to zero, and

$$-\alpha_j \le P(X(0) \le a, X(j) \le a) - P(X(0) \le a) P(X(j) \le a) \le \alpha_j$$

for any real a. Thus, after a little algebra with the indices, we get:

$$-\frac{1}{n} \sum_{j=1}^{n} \alpha_j \le \mathrm{Cov}[F_n(a), I_a(X(n))] \le \frac{1}{n} \sum_{j=1}^{n} \alpha_j.$$

Now, the right and left sides are Cesaro sums of the sequence $\alpha_j$. If we
define $C_n$ by

$$C_n = \frac{1}{n} \sum_{j=1}^{n} \alpha_j$$

then it is easily shown (see, e.g., Fuller [1976]) that if $\alpha_n$ converges
to zero, so does $C_n$. Thus, we have that:

$$\mathrm{Cov}[F_n(a), I_a(X(n))] \to 0$$

and so by Theorem 1, $\mathrm{Var}[F_n(a)] \to 0$.

Q.E.D.

Now  let  us  turn  to  the  covariance  of  the  empirical  distribution  function. 
Let  a  and  b  be  real  numbers  and  assume  that  a<b.  After  manipulating  the 
sum and using stationarity, the covariance may be written as:

$$\mathrm{Cov}(F_n(a), F_n(b)) = \frac{1}{n} F(a)[1 - F(b)]
+ \frac{1}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \bigl[ P(X(i) \le a, X(0) \le b) - F(a)F(b)
+ P(X(0) \le a, X(i) \le b) - F(a)F(b) \bigr]$$

The  first  term  is  that  for  the  case  of  independence  and  the  second  is  the 
correction  for  dependence.  We  also  have  the  following  corollary  whose 
proof  is  omitted,  but  is  very  similar  to  the  previous  proof. 

Corollary 1.2: If the stationary stochastic process X(t) satisfies
$\alpha$-mixing, then $\mathrm{Cov}(F_n(a), F_n(b))$ converges to zero for any a and b.

Let us review our results. We assume that we have n successive
observations of a stationary stochastic process defined for discrete time
and this process satisfies an $\alpha$-mixing condition. We have seen that the
empirical distribution function is unbiased and consistent as an estimator
of the cumulative distribution function. However, the variance and
covariance depend strongly on the form of the dependence between variables.
The variance is a measure of the average rate of convergence (in mean
square) of the estimator $F_n(a)$ to F(a). Thus, we have that the rate of
mean square convergence at a point depends on the form of the dependence
between variables. This perspective will be pursued in the following
section.

Let  us  next  introduce  the  amplitude  histogram.  Define  the  estimator 
of  the  amplitude  histogram  by: 

fn(a,b)  =  Fn(b)  -  Fn(a) 

where  it  is  assumed  that  b>a.  We  clearly  have: 
$$E[f_n(a,b)] = F(b) - F(a).$$

The  histogram  estimator  is  usually  used  to  construct  an  estimator  of  the 
probability  density  by  dividing  it  by  the  interval  length  (b-a).  From  the 
above,  we  see  that  it  is  a  biased  estimator  of  the  density.  The  difficulty 
here  stems  from  the  fact  that  the  density  is  the  derivative  of  the 
distribution  function. 

The above bias has led several authors (Parzen [1962a], Leadbetter and
Watson [1961]) to use so-called kernel estimates for the density (Wegman
[1972]). The problem of bias will be circumvented by considering the
histogram as an estimator of a theoretical amplitude histogram, f(a,b),
defined by:

$$f(a,b) = F(b) - F(a).$$

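The variance of this estimator is easily found from our previous
results. It is:

$$\mathrm{Var}[f_n(a,b)] = \frac{1}{n} \bigl[ F(b) - F(a) - (F(b) - F(a))^2 \bigr]
+ \frac{2}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \bigl[ P(X(0) \le a, X(i) \le a) - F(a)^2 + P(X(0) \le b, X(i) \le b) - F(b)^2
- P(X(0) \le a, X(i) \le b) - P(X(0) \le b, X(i) \le a) + 2F(a)F(b) \bigr]$$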

Note that again we have a term for the independent case and another for the
correction due to dependence. The covariance of two such estimators, say
$f_n(a,b)$ and $f_n(c,d)$, may also be written down in a straightforward manner,
but we will not do so here. We immediately have the following corollary
which is a simple extension of earlier results.

Corollary 1.3: If the stationary stochastic process X(t) defined for
integer time satisfies $\alpha$-mixing, then

$$\mathrm{Var}(f_n(a,b)) \to 0 \quad \text{for any } a \text{ and } b.$$

We shall end this section with a brief discussion of error limits on
the estimator $F_n(a)$. The variance of $F_n(a)$ and the covariance of $F_n(a)$ and
$F_n(b)$ are characterized by their dependence on the first and second order
distributions of X(t). As noted earlier, we may consider the indicator
function as generating a binary process. From this perspective, the
variance and covariance depend on the covariance function of the binary
process $I_a(X(t))$. In the case of independence, these covariances vanish
and the variance of $F_n(a)$ can be estimated in a straightforward fashion
from:

$$\mathrm{Var}[F_n(a)] \approx \frac{1}{n} F_n(a)[1 - F_n(a)].$$

For the case of dependence, no such simple procedure is possible. The
analogous procedure would require estimates of the second order
distribution, which is a computationally formidable task, and one which requires
very large amounts of data. However, a variance estimate can be obtained
by dealing with the generated binary process $I_a(X(t))$. The covariance
function of $I_a(X(t))$ can be estimated using standard techniques. This
method is amenable to computational streamlining by the techniques of
calculating autocovariances by the Fast Fourier Transform (FFT) algorithm
(see, e.g., Oppenheim and Schafer [1975]). The covariance function of
$I_a(X(t))$ can then be substituted into the expression for the variance of
$F_n(a)$ to yield the desired estimate. Such an approach, although an
improvement over direct estimation of the second order distribution, would
not be practical due to the complexity and time requirements of the
computations.
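The plug-in procedure just described (autocovariance of the binary indicator process computed by FFT, then substituted into the variance expression) can be sketched as follows. This is only an illustration under stated assumptions: NumPy is assumed, the function names are hypothetical, the biased (divide-by-n) sample autocovariance is used, and the truncation point `max_lag` is an assumption introduced here because the report does not say how many lags to retain.

```python
import numpy as np

def indicator_autocov_fft(x, a):
    """Biased sample autocovariance of I_a(X(t)) at lags 0..n-1, via FFT."""
    ind = (np.asarray(x, dtype=float) <= a).astype(float)
    n = ind.size
    d = ind - ind.mean()
    # Zero-pad to a power of two >= 2n - 1 so that the circular correlation
    # computed by the FFT equals the ordinary (linear) correlation.
    m = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(d, m)
    return np.fft.irfft(spec * np.conj(spec), m)[:n] / n

def ecdf_variance_estimate(x, a, max_lag):
    """Plug-in estimate of Var[F_n(a)], keeping lags 1..max_lag (assumed cutoff)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    acov = indicator_autocov_fft(x, a)
    h = np.arange(1, max_lag + 1)
    # acov[0] equals F_n(a)[1 - F_n(a)], the independent-case term
    return acov[0] / n + 2.0 / n * np.sum((1.0 - h / n) * acov[h])
```

With `max_lag = 0` the estimate reduces to the independent-case formula; larger values add the dependence correction at the cost of extra sampling noise in the estimated autocovariances.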
3. THE VARIANCE OF THE ESTIMATORS FOR GAUSSIAN PROCESSES


In this section, some properties of the variance of the estimators
$F_n(a)$ and $f_n(a,b)$ will be explored. We are motivated to treat the simplest
case. Gaussian processes are well known for their mathematical
tractability. The primary property of Gaussian processes which is important
here (indeed, it is indispensable) is the equivalence between the notions of
independent and uncorrelated.

Let $X_1$ and $X_2$ be random variables that possess the bivariate normal
distribution with means 0, variances $\sigma^2$, and correlation $\rho$. Then:

$$P(X_1 \le a, X_2 \le b) = \int_{-\infty}^{a} \int_{-\infty}^{b} \frac{1}{2\pi\sigma^2\sqrt{1-\rho^2}} \exp\left[ -\frac{x^2 + y^2 - 2\rho xy}{2\sigma^2(1-\rho^2)} \right] dy\, dx.$$

The double integral cannot be expressed in closed form, i.e., as an
explicit function of $\rho$. It can, however, be evaluated numerically and tabulated
(see, e.g., Abramowitz & Stegun [1964]). Some of these values are listed
in Table 1 for the cases a=b=0, a=b=0.5, and a=0, b=0.5. These were
evaluated numerically using a "double" 8-point Gaussian integration method, that
is, the inner integral was evaluated at the Gaussian points (determined by
the outer integral) by an 8-point formula. These values allow us to get
some numerical estimates for the variance and covariance of the estimator
$F_n(a)$ in certain circumstances and so get a qualitative understanding for
the effects of correlation.
   ρ      P(X<0,Y<0)    P(X<0,Y<0.5)    P(X<0.5,Y<0.5)

 -.90       0.0718         0.2031           0.3893
 -.80       0.1024         0.2222           0.3981
 -.70       0.1266         0.2403           0.4083
 -.60       0.1476         0.2572           0.4192
 -.50       0.1667         0.2731           0.4305
 -.40       0.1845         0.2884           0.4405
 -.30       0.2015         0.3031           0.4421
 -.20       0.2180         0.3175           0.4539
 -.10       0.2341         0.3317           0.4659
  0.0       0.2500         0.3457           0.4781
  .10       0.2659         0.3598           0.4907
  .20       0.2821         0.3740           0.5036
  .30       0.2985         0.3884           0.5171
  .40       0.3155         0.4031           0.5312
  .50       0.3333         0.4183           0.5462
  .60       0.3524         0.4343           0.5624
  .70       0.3734         0.4512           0.5805
  .80       0.3976         0.4693           0.6015
  .90       0.4282         0.4884           0.6283

Table 1: Values of bivariate Gaussian probabilities with means 0, variances
1 and correlation ρ.
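Bivariate Gaussian probabilities of this kind can be computed in a few lines. The sketch below does not reproduce the "double" 8-point Gaussian integration described in the text; it swaps in a one-dimensional integral over the correlation, P(X ≤ a, Y ≤ b) = Φ(a)Φ(b) + ∫₀^ρ f(a,b;r) dr, which rests on the derivative formula established in Lemma 1 later in this section. NumPy is assumed, and the function names and the quadrature order of 64 are illustrative choices.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Univariate standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_density(a, b, r):
    """Standard bivariate normal density at (a, b) with correlation r."""
    q = (a * a + b * b - 2.0 * r * a * b) / (2.0 * (1.0 - r * r))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def bvn_cdf(a, b, rho, order=64):
    """P(X <= a, Y <= b) for unit variances and correlation rho in (-1, 1)."""
    nodes, weights = np.polynomial.legendre.leggauss(order)
    r = 0.5 * rho * (nodes + 1.0)            # map [-1, 1] onto [0, rho]
    vals = np.array([bvn_density(a, b, ri) for ri in r])
    integral = 0.5 * rho * np.dot(weights, vals)
    return std_normal_cdf(a) * std_normal_cdf(b) + integral

print(bvn_cdf(0.0, 0.0, 0.5))
```

For a = b = 0 the result can be compared with the closed form 1/4 + arcsin(ρ)/(2π), which gives 1/3 at ρ = 0.5 and agrees with the 0.3333 entry of Table 1.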
The only case we can readily compute is the case where the Gaussian process
satisfies the equation:

$$X(t) = \frac{1}{\sqrt{1+\theta^2}} \bigl[ a(t) + \theta a(t-1) \bigr]$$

where the sequence of variables a(t) are independent, standard normal
variables (i.e., $a(t) \sim N(0,1)$). Thus, X(t) has a simple moving average
form (Box & Jenkins [1976]) such that:

$$E[X(t)] = 0$$

$$\mathrm{Cov}[X(s), X(t)] = \begin{cases} 1 & \text{if } s = t \\ \dfrac{\theta}{1+\theta^2} & \text{if } |s-t| = 1 \\ 0 & \text{if } |s-t| > 1. \end{cases}$$

Note that the correlation or dependence is only between adjacent values.
Note also that the defining equation is normalized so that both X(t) and
a(t) have unit variance.


If n is reasonably large, we may write the variance of $F_n(a)$ for this
model as:

$$n\,\mathrm{Var}[F_n(a)] \approx F(a)[1 - F(a)] + 2\bigl[ P(X_0 \le a, X_1 \le a) - F(a)^2 \bigr]$$

which has a right hand side which is independent of n. To compare with the
case of independence, we may calculate the ratio of this variance to the
variance for the case of independent variables:

$$R = \frac{F(a)[1 - F(a)] + 2\bigl[ P(X_0 \le a, X_1 \le a) - F(a)^2 \bigr]}{F(a)[1 - F(a)]}.$$


Table 2 exhibits values of $n\,\mathrm{Var}[F_n(a)]$ for correlations in the range
(-0.5, 0.5). This range is used because, as is easy to show by standard
calculus techniques, the correlation, $\theta/(1+\theta^2)$, achieves a maximum of 0.5
($\theta = 1.0$) and a minimum of -0.5 ($\theta = -1.0$) for the first order moving
average model. Note that Table 2 uses the values from Table 1. Finally,
to accentuate the effect, a column is included which lists the percent
change from the independent case.
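At the median (a = 0) the large-n variance and the ratio R can be evaluated without numerical integration, because the joint probability has the closed form P(X₀ ≤ 0, X₁ ≤ 0) = 1/4 + arcsin(ρ)/(2π) (Sheppard's theorem, cited later in this section). The following sketch uses only the Python standard library; the function names are illustrative.

```python
import math

def ma1_correlation(theta):
    """Lag-one correlation theta / (1 + theta^2) of the moving average model."""
    return theta / (1.0 + theta * theta)

def n_var_ecdf_at_median(rho):
    """Large-n value of n Var[F_n(0)] when only lag-one correlation rho is present."""
    p_joint = 0.25 + math.asin(rho) / (2.0 * math.pi)   # P(X0 <= 0, X1 <= 0)
    return 0.25 + 2.0 * (p_joint - 0.25)                # F(0)[1 - F(0)] = 1/4

def variance_ratio_at_median(rho):
    """Ratio R of the correlated-case variance to the independent-case variance."""
    return n_var_ecdf_at_median(rho) / 0.25

for theta in (-1.0, 0.0, 1.0):
    rho = ma1_correlation(theta)
    print(theta, rho, n_var_ecdf_at_median(rho), variance_ratio_at_median(rho))
```

The extreme cases θ = ±1 give ρ = ±0.5, the endpoints of the range used in Table 2; at ρ = 0.5 the closed form yields 5/12, close to the tabulated 0.41.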

Tables 1 and 2 exhibit some interesting properties that are the basis
for several of the following theorems. First, Table 1 shows that
$P(X_1 \le a, X_2 \le b)$ is a monotonically increasing function of the correlation for
certain values of a and b. That this property holds for any a and b will
be established in Lemma 1. Similarly, Table 2 shows that $n\,\mathrm{Var}[F_n(a)]$,
$n\,\mathrm{Var}[F_n(b)]$ and $n\,\mathrm{Cov}[F_n(a), F_n(b)]$ also seem to be monotonically increasing
functions of the correlation. This result will also be generalized in
Theorem 2.


a)    ρ      nVar[Fn(0)]      R      % change

    -0.5        0.08         0.32      -68
    -0.4        0.12         0.48      -52
    -0.3        0.15         0.60      -40
    -0.2        0.19         0.76      -24
    -0.1        0.22         0.88      -12
     0.0        0.25         1.00        0
     0.1        0.28         1.12       12
     0.2        0.31         1.24       24
     0.3        0.35         1.40       40
     0.4        0.37         1.48       48
     0.5        0.41         1.64       64

b)    ρ      nVar[Fn(.5)]     R      % change

    -0.5        0.12         0.55      -45
    -0.4        0.14         0.65      -35
    -0.3        0.15         0.66      -33
    -0.2        0.16         0.77      -23
    -0.1        0.19         0.89      -11
     0.0        0.21         1.00        0
     0.1        0.24         1.12       12
     0.2        0.26         1.24       24
     0.3        0.29         1.37       37
     0.4        0.32         1.50       50
     0.5        0.35         1.64       64

c)    ρ      nCov[Fn(0),Fn(.5)]     R      % change

    -0.5        0.009              0.02      -94
    -0.4        0.04               0.26      -74
    -0.3        0.07               0.45      -55
    -0.2        0.10               0.63      -36
    -0.1        0.13               0.84      -16
     0.0        0.15               1.00        0
     0.1        0.18               1.18       18
     0.2        0.21               1.36       36
     0.3        0.24               1.55       55
     0.4        0.27               1.75       75
     0.5        0.30               1.94       94

Table 2: Values of nVar[Fn(0)], nVar[Fn(.5)] and nCov[Fn(0),Fn(.5)] for
various values of correlation for a first order moving average
Gaussian process.


It is worth pointing out that Table 2 is constructed from a very
simple model, one which has correlations only between nearest "neighbors."
The construction of a similar table for even a simple first order
autoregressive (i.e., Markov) process would involve extensive calculations and
numerical integration. However, Table 2 shows the essential feature that
the presence of even small correlations between only adjacent times
produces large changes in the variance of, and the covariance between,
$F_n(a)$ and $F_n(b)$. Thus, a correlation of only 0.1 or -0.1 will produce a
change of about 12% in the variances of $F_n(0)$ and $F_n(0.5)$ and about a 16%
change in their covariance regardless of n.
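The size of this effect can also be checked by direct simulation of the moving average model. The sketch below is a Monte Carlo illustration, not the simulation procedure of the original report; NumPy is assumed, and the function names, path length, number of replications, and seed are arbitrary illustrative choices.

```python
import numpy as np

def simulate_ma1(theta, n, reps, rng):
    """reps independent length-n paths of X(t) = (a(t) + theta a(t-1)) / sqrt(1 + theta^2)."""
    a = rng.standard_normal((reps, n + 1))
    return (a[:, 1:] + theta * a[:, :-1]) / np.sqrt(1.0 + theta * theta)

def n_var_ecdf_mc(theta, level, n=200, reps=20000, seed=0):
    """Monte Carlo estimate of n Var[F_n(level)] for the moving average model."""
    rng = np.random.default_rng(seed)
    x = simulate_ma1(theta, n, reps, rng)
    f_n = np.mean(x <= level, axis=1)        # one value of F_n(level) per path
    return n * np.var(f_n, ddof=1)

# theta = 0 is the independent case; theta = 1 gives lag-one correlation 0.5
print(n_var_ecdf_mc(0.0, 0.0), n_var_ecdf_mc(1.0, 0.0))
```

Up to sampling error, the two printed values should be near 0.25 and near the ρ = 0.5 entry of Table 2a, respectively.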

Let  us  now  proceed  to  generalize  these  observations. 

Lemma 1: Let R be the bivariate Gaussian cumulative distribution
function evaluated at (a,b):

$$R(\rho) = \int_{-\infty}^{a} \int_{-\infty}^{b} \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[ -\frac{x^2 + y^2 - 2\rho xy}{2(1-\rho^2)} \right] dy\, dx$$

where unit variance is assumed. Then R is a monotonically increasing
function of $\rho$ on the interval (-1.0, 1.0). Furthermore, its derivative is
given by:

$$\frac{dR}{d\rho} = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[ -\frac{a^2 + b^2 - 2\rho ab}{2(1-\rho^2)} \right].$$

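Proof: Define:

$$P = P(c < X_1 \le a,\; d < X_2 \le b) = \int_{c}^{a} \int_{d}^{b} f(x_1, x_2)\, dx_2\, dx_1$$

where f(.,.) is the bivariate Gaussian density with means 0, variances 1,
and correlation $\rho$. We have, by definition of the characteristic function,

$$f(x_1, x_2) = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(-it'x) \exp(-\tfrac{1}{2} t'Vt)\, dt_1\, dt_2$$

where V is the covariance matrix of the 2-vector $x = (x_1, x_2)$ and t is the
2-vector $t = (t_1, t_2)$. Then after substituting, we get:

$$P = \int_{c}^{a} \int_{d}^{b} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{1}{(2\pi)^2} \exp(-it'x) \exp(-\tfrac{1}{2} t'Vt)\, dt_1\, dt_2\, dx_2\, dx_1.$$

Interchanging the order of integration yields:

$$P = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2} t'Vt) \left[ \int_{c}^{a} \int_{d}^{b} \exp(-it'x)\, dx_2\, dx_1 \right] dt_1\, dt_2$$

$$= \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2} t'Vt) \left[ \frac{e^{-it_1 c} - e^{-it_1 a}}{it_1} \right] \left[ \frac{e^{-it_2 d} - e^{-it_2 b}}{it_2} \right] dt_1\, dt_2$$

$$= -\frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{1}{t_1 t_2} e^{-\frac{1}{2} t'Vt} \left[ e^{-i(t_1 c + t_2 d)} - e^{-i(t_1 c + t_2 b)} - e^{-i(t_1 a + t_2 d)} + e^{-i(t_1 a + t_2 b)} \right] dt_1\, dt_2.$$

Now,

$$V = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}, \qquad t'Vt = t_1^2 + t_2^2 + 2\rho t_1 t_2.$$

Thus, taking derivatives inside the integral gives:

$$\frac{dP}{d\rho} = \frac{1}{(2\pi)^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{1}{2} t'Vt} \left[ e^{-i(t_1 c + t_2 d)} - e^{-i(t_1 c + t_2 b)} - e^{-i(t_1 a + t_2 d)} + e^{-i(t_1 a + t_2 b)} \right] dt_1\, dt_2.$$

But this integral is just the algebraic sum of four bivariate normal
densities, that is:

$$\frac{dP}{d\rho} = f(c,d) + f(a,b) - f(c,b) - f(a,d).$$

Now if we take limits as $c \to -\infty$ and $d \to -\infty$, then $P \to R(\rho)$ and so:

$$\frac{dR}{d\rho} = \frac{dP}{d\rho} = f(a,b) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[ -\frac{a^2 + b^2 - 2\rho ab}{2(1-\rho^2)} \right].$$

Clearly, $dR/d\rho$ is greater than 0 for any $\rho \in (-1,1)$ and for any a and b.

Q.E.D.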


As a result of this lemma, we may write the indefinite integral:

$$I(a,b) = \int \frac{dR}{d\rho}\, d\rho = \int \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[ -\frac{a^2 + b^2 - 2\rho ab}{2(1-\rho^2)} \right] d\rho.$$

If a=b=0, we get:

$$I(0,0) = \int \frac{1}{2\pi\sqrt{1-\rho^2}}\, d\rho$$

which, upon integrating, yields Sheppard's theorem on median dichotomy
(Kendall & Stuart, Vol. 1, [1969]).
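For reference, the integration can be carried out explicitly. The worked steps below use only the Lemma 1 derivative at a = b = 0 and the fact that, at ρ = 0, the two variables are independent with P(X ≤ 0) = 1/2; the dummy variable r is introduced here for the integration.

```latex
\begin{aligned}
R(\rho) - R(0) &= \int_0^{\rho} \frac{dr}{2\pi\sqrt{1-r^2}} = \frac{\arcsin\rho}{2\pi}, \\
R(0) &= P(X_1 \le 0)\,P(X_2 \le 0) = \tfrac{1}{4}, \\
P(X_1 \le 0,\; X_2 \le 0) &= \frac{1}{4} + \frac{\arcsin\rho}{2\pi}.
\end{aligned}
```

As a check against Table 1, ρ = 0.5 gives 1/4 + 1/12 = 1/3, which matches the tabulated 0.3333.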


We  are  now  prepared  to  prove  our  central  result. 




Theorem 2: Let X(t) be a stationary Gaussian process defined for integer
time with mean 0 and variance 1. Let $\rho_i$ be the correlation between X(t)
and X(t+i). Then, $\mathrm{Var}[F_n(a)]$ is a monotonically increasing function of $\rho_i$
for $\rho_i \in (-1,1)$. Furthermore, we have that:

$$\frac{\partial\, \mathrm{Var}[F_n(a)]}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{1}{2\pi\sqrt{1-\rho_i^2}} \exp\left[ \frac{-a^2}{1+\rho_i} \right].$$


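Proof: We have:

$$\mathrm{Var}[F_n(a)] = \frac{1}{n} F(a)[1 - F(a)] + \frac{2}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \bigl[ P(X(0) \le a, X(i) \le a) - F(a)^2 \bigr].$$

Let the variance (unity) and $\rho_j$ be maintained constant for $j \ne i$. Then:

$$\frac{\partial\, \mathrm{Var}[F_n(a)]}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{\partial}{\partial \rho_i} P(X(0) \le a, X(i) \le a).$$

By Lemma 1 and since i<n, we have:

$$\frac{\partial\, \mathrm{Var}[F_n(a)]}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{1}{2\pi\sqrt{1-\rho_i^2}} \exp\left[ \frac{-a^2}{1+\rho_i} \right].$$

This is non-negative for any a and for $\rho_i \in (-1,1)$.

Q.E.D.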

We also have a monotonicity property for the covariance between $F_n(a)$
and $F_n(b)$:

Theorem 3: Let X(t) be a stationary Gaussian process defined for integer
time with mean 0 and variance 1. Let $\rho_i$ be the correlation between X(t)
and X(t+i). Then $\mathrm{Cov}[F_n(a), F_n(b)]$ is a monotonically increasing function
of $\rho_i$ for $\rho_i \in (-1,1)$. Furthermore, we have:

$$\frac{\partial\, \mathrm{Cov}[F_n(a), F_n(b)]}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{1}{2\pi\sqrt{1-\rho_i^2}} \exp\left[ -\frac{a^2 + b^2 - 2\rho_i ab}{2(1-\rho_i^2)} \right].$$


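Proof: For ease of notation, let $C = \mathrm{Cov}[F_n(a), F_n(b)]$. Let a<b, clearly
without loss of generality. We have, by an earlier result:

$$C = \frac{1}{n} F(a)[1 - F(b)] + \frac{1}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \bigl[ P(X(i) \le a, X(0) \le b) - F(a)F(b) \bigr]
+ \frac{1}{n} \sum_{i=1}^{n-1} \left(1 - \frac{i}{n}\right) \bigl[ P(X(0) \le a, X(i) \le b) - F(a)F(b) \bigr].$$

If we maintain the variance (=1) and $\rho_j$ constant for $j \ne i$, we get:

$$\frac{\partial C}{\partial \rho_i} = \frac{1}{n} \left(1 - \frac{i}{n}\right) \frac{\partial}{\partial \rho_i} P(X(i) \le a, X(0) \le b) + \frac{1}{n} \left(1 - \frac{i}{n}\right) \frac{\partial}{\partial \rho_i} P(X(0) \le a, X(i) \le b)$$

$$= \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{\partial}{\partial \rho_i} P(X(0) \le a, X(i) \le b).$$

This last step follows from stationarity ($\rho_i = \rho_{-i}$) and from the symmetry
of the bivariate Gaussian density. By Lemma 1, this becomes:

$$\frac{\partial C}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{1}{2\pi\sqrt{1-\rho_i^2}} \exp\left[ -\frac{a^2 + b^2 - 2\rho_i ab}{2(1-\rho_i^2)} \right].$$

The monotonicity now follows from Lemma 1.

Q.E.D.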

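It is tempting to immediately extrapolate this monotonicity property
of $\mathrm{Var}[F_n(a)]$ to $\mathrm{Var}[f_n(a,b)]$. We have:

$$\mathrm{Var}[f_n(a,b)] = \mathrm{Var}[F_n(a)] + \mathrm{Var}[F_n(b)] - 2\,\mathrm{Cov}[F_n(a), F_n(b)].$$

After substituting the results of the previous two theorems, we get:

$$\frac{\partial\, \mathrm{Var}[f_n(a,b)]}{\partial \rho_i} = \frac{2}{n} \left(1 - \frac{i}{n}\right) \frac{1}{2\pi\sqrt{1-\rho_i^2}} \left[ \exp\left( \frac{-a^2}{1+\rho_i} \right) + \exp\left( \frac{-b^2}{1+\rho_i} \right) - 2\exp\left( -\frac{a^2 + b^2 - 2\rho_i ab}{2(1-\rho_i^2)} \right) \right].$$

Let G be the term in brackets. Now

$$a^2 + b^2 - 2\rho ab = a^2 - \rho a^2 + b^2 - \rho b^2 + \rho a^2 + \rho b^2 - 2\rho ab
= a^2(1-\rho) + b^2(1-\rho) + \rho(b-a)^2.$$

So, G can be written as:

$$G = \exp\left( \frac{-a^2}{1+\rho} \right) + \exp\left( \frac{-b^2}{1+\rho} \right) - 2\exp\left( \frac{-a^2}{2(1+\rho)} \right) \exp\left( \frac{-b^2}{2(1+\rho)} \right) \exp\left( \frac{-\rho(b-a)^2}{2(1-\rho^2)} \right)$$

where the subscript i has been omitted for simplicity. Now if $\rho \ge 0$, then

$$\exp\left( \frac{-\rho(b-a)^2}{2(1-\rho^2)} \right) \le 1.$$

So,

$$G \ge \exp\left( \frac{-a^2}{1+\rho} \right) + \exp\left( \frac{-b^2}{1+\rho} \right) - 2\exp\left( \frac{-a^2}{2(1+\rho)} \right) \exp\left( \frac{-b^2}{2(1+\rho)} \right)
= \left[ \exp\left( \frac{-a^2}{2(1+\rho)} \right) - \exp\left( \frac{-b^2}{2(1+\rho)} \right) \right]^2 \ge 0.$$
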
Thus, we have established the following corollary:

Corollary 3.1: Let X(t) be a stationary Gaussian process defined for
integer time with mean 0 and variance 1. Let $\rho_i$ be the correlation
between X(t) and X(t+i). Then $\mathrm{Var}[f_n(a,b)]$ is a monotonically increasing
function of $\rho_i$ for $\rho_i \in [0,1]$.

It is worth emphasizing the difference between this corollary and the
previous theorems. The distinction lies in the range of the correlation:
$\mathrm{Var}[f_n(a,b)]$ is monotonic only for non-negative correlations, while
$\mathrm{Var}[F_n(a)]$ is monotonic for all correlations.
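The restriction to non-negative correlations can be illustrated by evaluating the bracketed term G directly. The sketch below uses only the Python standard library; the function name `bracket_term` and the test points are illustrative choices. It shows one pair (a, b) for which G is negative at a negative correlation, so the derivative of the histogram variance changes sign there.

```python
import math

def bracket_term(a, b, rho):
    """G = exp(-a^2/(1+rho)) + exp(-b^2/(1+rho))
           - 2 exp(-(a^2 + b^2 - 2 rho a b) / (2 (1 - rho^2)))."""
    cross = (a * a + b * b - 2.0 * rho * a * b) / (2.0 * (1.0 - rho * rho))
    return (math.exp(-a * a / (1.0 + rho))
            + math.exp(-b * b / (1.0 + rho))
            - 2.0 * math.exp(-cross))

# Interval symmetric about the mean: the sign of G depends on the sign of rho
print(bracket_term(-1.0, 1.0, 0.5))    # non-negative correlation
print(bracket_term(-1.0, 1.0, -0.5))   # negative correlation
```

For a = -1, b = 1 the first value is positive and the second is negative, consistent with the corollary covering only non-negative correlations.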
4. DISCUSSION


We have presented a framework for probability estimation from
stochastic processes that is a generalization of that from independent,
identically distributed variables. Thus, the requirement of stationarity
is an extension of that of being identically distributed. Similarly, the
requirement of $\alpha$-mixing is an extension of the notion of independence;
indeed, $\alpha$-mixing implies asymptotic independence and is thus a satisfying
generalization. We have seen that, under the conditions of stationarity
and $\alpha$-mixing, the empirical distribution function is an unbiased and
consistent estimator of the univariate cumulative distribution function.
The same applies to the empirical amplitude histogram as an estimator of
the theoretical amplitude histogram.

We have shown that the presence of correlations is manifested in the
variance and covariance of the estimators. Specifically, these variances
and covariances have been expressed as a sum of two terms: the first is
simply that for independent variables, while the second is the correction
due to dependence. By simulation, we showed numerically that even small
correlations may have a pronounced effect on the variance and covariance.
A monotonicity property of the bivariate Gaussian distribution was used to
characterize the effect of correlations for Gaussian processes. It was
established that the variances and covariances are monotonically related to
the correlations for Gaussian processes.

The convergence of the estimators as the sample size increases is
assured by the consistency property. We may regard the estimator variance
as a measure of the rate of convergence in sample size. Specifically, the
variance gives the average (over the ensemble) rate of convergence in mean
square. Thus, the monotonicity of the variance implies a monotonicity of
the rate of convergence. For Gaussian processes, large positive
correlations imply slow convergence and, consequently, poor estimates compared to
the same sized samples of independent data. There is another aspect of the
monotonicity property which should be noted. This is that the variance and
covariance of the empirical distribution function decrease as the
correlations become more negative. Negative correlations are beneficial in the
sense that they result in better estimates than the independent case. This
property of negative correlations is not shared by the amplitude histogram.

Although our focus has been on probability estimation, our results
have consequences for hypothesis testing with correlated data. This is
especially true for hypothesis tests of univariate Gaussianity. Weiss
(1978) has devised an empirical correction formula for the critical values
of the Kolmogorov-Smirnov test for Gaussianity when there are correlations
present. Since this test is based on a comparison of the empirical and
Gaussian distribution functions, we would expect our results to be
applicable. This modified Kolmogorov-Smirnov test exhibited very poor power in
Weiss' simulations of data dominated by positive correlations, i.e., poor
compared to the power of the test for independent variables. Weiss
effectively could not distinguish correlated data generated by an
autoregression driven by Gaussian variables from that generated from uniform
variables. Our monotonicity theorems suggest a possible explanation: the
empirical distribution function of the Gaussian process cannot be
determined as accurately when there are positive correlations dominating the
data. It is this inherent inaccuracy of the measurement that may be
responsible for this poor power.